



[Figure: Despreader integration with input and output muxes]





[Figure: enhanced multiplier block diagram. Labels: DPU Regs, Mult mux 0, Mult 0, Mult 1, Desp/Corr Tree 0 to 3, Adder 0 to 3, Output mux 0, Output mux 1]

The A and B input muxes select from the following sets of 32-bit signals:

■ 16 local interconnects (8 previous DPU/MULTs, 7 next, and DPU output feedback)
■ 9 global vertical nets
■ 3 reserved
■ LSM read data
■ LFSR feedback
■ ALU output feedback
■ Logical zero (32'h0)
8x Despreader/Correlator Enhanced Multiplier Opcode

Conventions:

■ Naming: 1-bit values, both real and imaginary, are referred to as codes, and sometimes as PN codes. The real and imaginary parts of a complex value are written as {real-part, imaginary-part}, {real, img}, {i, j}, and {I, Q}, with a preference for (real, img) and (I, Q).

■ Code mapping: we adopt the convention 0 -> 1 and 1 -> -1. This allows us to treat the one-bit code values as the sign bits of 1-bit integers. This is compliant with CDMA2000 but contrary to some common usage. An XOR can be used to convert from this mapping to the opposite one.

■ Example: the code 01 implies 0 for the real part and 1 for the imaginary part.

■ Real-img ordering: we adopt the convention that the img part of a complex number is allocated to the LSB, or little-endian, position. The motivation is to place real on the left and img on the right when viewing 32-bit values in hex or binary displays.

■ Example: the code 01 encodes 1-j, assuming img or 'j' is in the LSB.

■ Earliest-latest ordering: we adopt the convention that the earliest samples in time are assigned the LSB slots. This is in line with naming samples in ascending order when written in time sequence.

■ Example: a time sequence of values on a port appears as D0:D1:D2:D3.

■ 1-bit * 8-bit complex multiplier format: we adopt the convention that we implement a mathematical complex multiply assuming the input samples are preformatted as real+img, real-img pairs. Other conventions, such as real multiplies and complex conjugates of the imaginary part of the code, require additional preformatting of the input data, but may be implemented as opcode options.

■ Example: {0,0}*{i,q} = {(i-q),(i+q)};
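
The code mapping and the last example above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `code_to_complex` and `code_multiply` are hypothetical, and the model uses Python's built-in complex type rather than the 8-bit hardware datapath.

```python
def code_to_complex(code):
    """Map a 2-bit code to a complex value using the 0 -> +1, 1 -> -1 convention.

    Bit 1 is the real part and bit 0 the imaginary part (img in the LSB).
    """
    real = -1 if (code >> 1) & 1 else 1
    img = -1 if code & 1 else 1
    return complex(real, img)


def code_multiply(code, i, q):
    """Multiply the sample i + jq by the 1-bit complex code; returns (real, img)."""
    product = code_to_complex(code) * complex(i, q)
    return int(product.real), int(product.imag)
```

For instance, `code_multiply(0b00, 5, 3)` multiplies 5+3j by 1+j and yields the pair (i-q, i+q) described in the example.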



Additions:

1-bit * 8-bit complex vector dot product definition: SUM(code[]*data[])

int nelm = 8;                 /* number of elements */
complex_1 code[8];
complex_8 data[8];
complex_8 dotproduct = 0;     /* accumulator must start at zero */
for (n = 0; n < nelm; n++)
{
    dotproduct += code[n] * data[n];
}
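
A minimal runnable sketch of the same dot product, written with explicit real and imaginary arithmetic so the sign handling is visible. The name `dot_product` and the representation of samples as `(real, img)` integer tuples are assumptions made for illustration; the accumulator here is an unbounded Python integer, whereas real hardware would need a defined accumulator width.

```python
def dot_product(codes, data):
    """Return SUM(code[n] * data[n]) as a (real, img) pair.

    codes: iterable of 2-bit codes (bit 1 = real sign, bit 0 = img sign).
    data:  iterable of (real, img) integer samples.
    """
    acc_r, acc_i = 0, 0
    for code, (r, i) in zip(codes, data):
        c_r = -1 if (code >> 1) & 1 else 1   # CODE[1]: 0 -> +1, 1 -> -1
        c_i = -1 if code & 1 else 1          # CODE[0]: 0 -> +1, 1 -> -1
        acc_r += c_r * r - c_i * i           # real part of (c_r + j c_i)(r + j i)
        acc_i += c_r * i + c_i * r           # imaginary part
    return acc_r, acc_i
```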



2 CODE code encoding

1-bit complex integer format - CODE.i, CODE.q

CODE code value    Numerical meaning
0                  +1.0
1                  -1.0

3 CODE format

1-bit complex integer format - CODE[1:0]

Bit        Numerical meaning
CODE[1]    I, i, or real part
CODE[0]    Q, j, or imaginary part

4 Complex 16-bit data input format

16-bit complex integer format - IN[31:0]

Field        Numerical meaning
IN[31:16]    I or real part
IN[15:00]    Q or imaginary part

5 Complex 8-bit data input format 0, 16-bit aligned

8-bit complex integer format - IN[31:0]

Field        Numerical meaning
IN[23:16]    I or real part
IN[07:00]    Q or imaginary part

6 Dual complex 8-bit data input format 1, 16-bit aligned

Dual 8-bit complex integer format - IN[31:0]

Field        Numerical meaning
IN[31:24]    i1 or real part, sample 1
IN[15:08]    q1 or imaginary part, sample 1
IN[23:16]    i0 or real part, sample 0
IN[07:00]    q0 or imaginary part, sample 0
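
The dual 8-bit layout can be illustrated with a small pack/unpack pair. This is a sketch under the assumption that samples are signed two's complement bytes, which the format tables do not state explicitly; the function names are hypothetical.

```python
def pack_dual8(i0, q0, i1, q1):
    """Pack two complex 8-bit samples into one 32-bit word.

    Layout: IN[31:24]=i1, IN[23:16]=i0, IN[15:08]=q1, IN[07:00]=q0.
    """
    for value in (i0, q0, i1, q1):
        if not -128 <= value <= 127:
            raise ValueError("sample out of signed 8-bit range")
    return (((i1 & 0xFF) << 24) | ((i0 & 0xFF) << 16)
            | ((q1 & 0xFF) << 8) | (q0 & 0xFF))


def _signed8(byte):
    """Interpret an 8-bit field as a two's complement value (assumption)."""
    return byte - 256 if byte & 0x80 else byte


def unpack_dual8(word):
    """Inverse of pack_dual8; returns (i0, q0, i1, q1)."""
    i1 = _signed8((word >> 24) & 0xFF)
    i0 = _signed8((word >> 16) & 0xFF)
    q1 = _signed8((word >> 8) & 0xFF)
    q0 = _signed8(word & 0xFF)
    return i0, q0, i1, q1
```

Note that each sample's real byte sits 16 bits above its imaginary byte, which matches the single-sample format 0 when sample 1 is zero.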
CODE formats

Each CODE bit pair drives a mux/negate unit connected to two input bytes. I# refers to one of the 4 input muxes; D# refers to a register in the delay chain.

               4XDESP8                 8XDESP8                 16XCorrelate
Packing in
32-bit word    0 : i0 : 0 : q0         i1 : i0 : q1 : q0       0 : i0 : 0 : q0

CODE bits      Data input              Data input              Data input
0, 1           I0[23:16], I0[07:00]    I0[23:16], I0[07:00]    (illegible)
2, 3           I1[23:16], I1[07:00]    I0[31:24], I0[15:08]    (illegible)
4, 5           I2[23:16], I2[07:00]    I1[23:16], I1[07:00]    D1[15:08], D1[07:00]
6, 7           I3[23:16], I3[07:00]    I1[31:24], I1[15:08]    D2[15:08], D2[07:00]
8, 9           -                       I2[23:16], I2[07:00]    D3[15:08], D3[07:00]
10, 11         -                       I2[31:24], I2[15:08]    D4[15:08], D4[07:00]
12, 13         -                       I3[23:16], I3[07:00]    D5[15:08], D5[07:00]
14, 15         -                       I3[31:24], I3[15:08]    D6[15:08], D6[07:00]
16, 17         I0[23:16], I0[07:00]    I0[23:16], I0[07:00]    D7[15:08], D7[07:00]
18, 19         I1[23:16], I1[07:00]    I0[31:24], I0[15:08]    D8[15:08], D8[07:00]
20, 21         I2[23:16], I2[07:00]    I1[23:16], I1[07:00]    D9[15:08], D9[07:00]
22, 23         I3[23:16], I3[07:00]    I1[31:24], I1[15:08]    D10[15:08], D10[07:00]
24, 25         -                       I2[23:16], I2[07:00]    D11[15:08], D11[07:00]
26, 27         -                       I2[31:24], I2[15:08]    D12[15:08], D12[07:00]
28, 29         -                       I3[23:16], I3[07:00]    D13[15:08], D13[07:00]
30, 31         -                       I3[31:24], I3[15:08]    D14[15:08], D14[07:00]
[Figure: Complex Vector Multiply units T0, T1. Input data busses not shown.]
CODE multiply routing

A mux is employed to route CODE bits to the correct CODE multiply unit (the mux-negate unit), whose truth table is given below. The table that follows it summarizes the routing to the 32 mux-negate units as a function of opcode. The 16XADD and 16XASUB opcodes could also be implemented in the mux/negate block instead of the mux.


CODE (real, img)    mapping    result.real    result.img
00                  +1, +1     +r             +i
01                  +1, -1     +i             -r
10                  -1, +1     -i             +r
11                  -1, -1     -r             -i

mux-negate               4XDESP        8XDESP        16XCorrelate
unit        Despreader   C src bits    C src bits    C src bits
x00         T0.img       c[0,1]        c[0,1]        c[0,1]
x01         T0.img       c[2,3]        c[4,5]        c[2,3]
x02         T0.img       c[4,5]        c[8,9]        c[4,5]
x03         T0.img       c[6,7]        c[12,13]      c[6,7]
x04         T0.img       -             c[2,3]        c[8,9]
x05         T0.img       -             c[6,7]        c[10,11]
x06         T0.img       -             c[10,11]      c[12,13]
x07         T0.img       -             c[14,15]      c[14,15]
y08         T0.real      c[0,1]        c[0,1]        c[0,1]
y09         T0.real      c[2,3]        c[4,5]        c[2,3]
y10         T0.real      c[4,5]        c[8,9]        c[4,5]
y11         T0.real      c[6,7]        c[12,13]      c[6,7]
y12         T0.real      -             c[2,3]        c[8,9]
y13         T0.real      -             c[6,7]        c[10,11]
y14         T0.real      -             c[10,11]      c[12,13]
y15         T0.real      -             c[14,15]      c[14,15]
x16         T1.img       c[16,17]      c[16,17]      c[16,17]
x17         T1.img       c[18,19]      c[20,21]      c[18,19]
x18         T1.img       c[20,21]      c[24,25]      c[20,21]
x19         T1.img       c[22,23]      c[28,29]      c[22,23]
x20         T1.img       -             c[18,19]      c[24,25]
x21         T1.img       -             c[22,23]      c[26,27]
x22         T1.img       -             c[26,27]      c[28,29]
x23         T1.img       -             c[30,31]      c[30,31]
y24         T1.real      c[16,17]      c[16,17]      c[16,17]
y25         T1.real      c[18,19]      c[20,21]      c[18,19]
y26         T1.real      c[20,21]      c[24,25]      c[20,21]
y27         T1.real      c[22,23]      c[28,29]      c[22,23]
y28         T1.real      -             c[18,19]      c[24,25]
y29         T1.real      -             c[22,23]      c[26,27]
y30         T1.real      -             c[26,27]      c[28,29]
y31         T1.real      -             c[30,31]      c[30,31]
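
The routing table follows a regular pattern that can be captured in a short function. The sketch below is an interpretation of the table, not a description of the actual mux wiring; the function name `code_source_bits` and the opcode strings are hypothetical labels chosen for illustration.

```python
def code_source_bits(opcode, unit):
    """Return the pair of CODE bit indices feeding mux-negate unit `unit` (0..31).

    Returns None when the unit is unused for the given opcode.
    """
    if not 0 <= unit <= 31:
        raise ValueError("unit must be in 0..31")
    base = 16 * (unit // 16)   # units 0-15 belong to T0, 16-31 to T1
    k = unit % 8               # position within the img or real group of 8
    if opcode == "4XDESP":
        # Only the first four units of each group are used.
        first = base + 2 * k if k < 4 else None
    elif opcode == "8XDESP":
        # First four units take every other pair; the last four take the pairs between.
        first = base + 4 * k if k < 4 else base + 4 * (k - 4) + 2
    elif opcode == "16XCORRELATE":
        first = base + 2 * k
    else:
        raise ValueError("unknown opcode")
    return None if first is None else (first, first + 1)
```

For example, unit 5 (x05) under 8XDESP reads c[6,7], and unit 20 (x20) under 16XCorrelate reads c[24,25], matching the rows of the table.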
Multiply Unit

CODE Multiply format

The code multiply unit multiplies 2 complex values: a 1-bit complex value known as the code and an 8-bit complex pair known as the data.

This leads to the table:

CODE (real, img)    result.real =           result.img =
00 ->  1, 1           data.r - data.i         data.r + data.i
01 ->  1,-1           data.r + data.i       - data.r + data.i
10 -> -1, 1         - data.r - data.i         data.r - data.i
11 -> -1,-1         - data.r + data.i       - data.r - data.i

For efficient implementation this is to be implemented in the despreader by:

1) requiring the data to be preformatted as: data.r = r-i, data.i = r+i;
2) using a mux followed by a negate to implement the multiply as follows:

CODE (real, img)    result.real    result.img
00 ->  1, 1         r - i          r + i
01 ->  1,-1         r + i          -(r - i)
10 -> -1, 1         -(r + i)       r - i
11 -> -1,-1         -(r - i)       -(r + i)

If a 45 degree rotation and scaling is allowed, as is acceptable when pilot and data are decoded together, the preformatting can be dropped to yield the following function table:

CODE (real, img)    result.real    result.img
00 ->  1, 1         r              i
01 ->  1,-1         i              -(r)
10 -> -1, 1         -(i)           r
11 -> -1,-1         -(r)           -(i)

The real part is close to the UMTS encoding:

bits    UMTS
00      + i
01      + r
10      - r
11      - i
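
The equivalence between the mux-negate tables and a true complex multiply can be verified numerically. The following is a minimal sketch with hypothetical function names; it models the arithmetic only and ignores bit widths.

```python
def mux_negate_preformatted(code, r, i):
    """Mux-negate multiply operating on preformatted data (r - i, r + i)."""
    diff, total = r - i, r + i            # the preformatting step
    table = {
        0b00: (diff, total),
        0b01: (total, -diff),
        0b10: (-total, diff),
        0b11: (-diff, -total),
    }
    return table[code]


def mux_negate_rotated(code, r, i):
    """Variant without preformatting; the result is rotated by 45 degrees and scaled."""
    table = {0b00: (r, i), 0b01: (i, -r), 0b10: (-i, r), 0b11: (-r, -i)}
    return table[code]


def reference_multiply(code, r, i):
    """Direct complex multiply using the 0 -> +1, 1 -> -1 code mapping."""
    c = complex(-1 if code & 0b10 else 1, -1 if code & 0b01 else 1)
    p = c * complex(r, i)
    return int(p.real), int(p.imag)
```

Multiplying the rotated result by 1+j recovers the reference product for every code, which is one way to see that the two tables differ only by a fixed rotation and a scale of sqrt(2).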
Mux-negate Options

Besides complex multiply, other modes of the mux-negate units are useful.

These modes are:

complex - the normal multiply.
complex-conjugate - complement the imaginary part of the code before the multiply.
zero - force the code to 00 to effect an adder.
real - use the real part of the code to negate the real part of the data, and the img part of the code to negate the img part of the data.

The following truth table would apply if we decided to implement these additional modes (assume the data at the mux is called real, img). Assuming the 4 modes are adopted, we can use input mux bits for the source of the mode bits.

mode           code    real result    img result
complex        00      real           img
complex        01      img            -real
complex        10      -img           real
complex        11      -real          -img
complex-cnj    01      real           img
complex-cnj    00      img            -real
complex-cnj    11      -img           real
complex-cnj    10      -real          -img
real-r*        0x      real
real-r         1x      -real
real-i**       x0                     img
real-i         x1                     -img
zero           xx      real           img

* real mode selects the real input and uses code[1] to control negation for the real output.

** real mode selects the img input and uses code[0] to control negation for the img output.
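
The four modes can be modeled in a few lines. This is an illustrative sketch only: the function name `mux_negate` and the mode strings are hypothetical, and the complex-conjugate rows assume that the mode simply inverts code bit 0 before applying the complex table, as the mode description states.

```python
def mux_negate(mode, code, real, img):
    """Model one mux-negate unit. `code` is 2 bits: bit 1 real, bit 0 img."""
    if mode == "zero":
        code = 0b00                  # force code to 00 so the data passes through
    elif mode == "complex-cnj":
        code ^= 0b01                 # complement the imaginary code bit
    elif mode == "real":
        # Each code bit independently negates its own data component.
        return (-real if code & 0b10 else real,
                -img if code & 0b01 else img)
    elif mode != "complex":
        raise ValueError("unknown mode")
    table = {
        0b00: (real, img),
        0b01: (img, -real),
        0b10: (-img, real),
        0b11: (-real, -img),
    }
    return table[code]
```

In zero mode the unit passes the data unchanged, so the downstream tree acts as a plain adder over the inputs.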
Despreading

Despreading is used in rake receivers. The common case in both 1XRTT and UMTS is to despread a symbol against 2 separate codes to recover the pilot and data channels. The circuit below is used as a test case to evaluate despreading performance for a variety of architectures. It despreads 4 16-bit or 8 8-bit complex input samples, known as "chips", to form two complex results corresponding to the pilot and data outputs of a rake despreader. Each input is stored as 8-bit complex data, which may be unpacked to 16-bit complex data. The input data is assumed to be stored in separate LSM memories and is addressed so as to read out a contiguous neighborhood of 4 samples separated by a 1-chip time delay.

[Figure: despreading test circuit. Labels: pilot_code, data_code, LSMs with adr inputs, I0 to I3, pilot_out, data_out; the circled-x symbol denotes a 1-bit complex multiply.]
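
A behavioral sketch of the dual-code despreader described above: each chip is multiplied by the pilot code and by the data code, and the two sets of products are summed separately. The function name `despread` is hypothetical, chips are modeled as Python complex numbers, and the multiply is a plain (non-conjugating) complex multiply under the 0 -> +1, 1 -> -1 mapping.

```python
def despread(chips, pilot_codes, data_codes):
    """Despread complex chips against two 1-bit complex code sequences.

    Returns (pilot_out, data_out) as complex numbers.
    """
    def sign(bit):
        return -1 if bit else 1

    def correlate(codes):
        acc = 0
        for chip, code in zip(chips, codes):
            # Bit 1 of the code is the real sign, bit 0 the imaginary sign.
            acc += complex(sign((code >> 1) & 1), sign(code & 1)) * chip
        return acc

    return correlate(pilot_codes), correlate(data_codes)
```

A 4-chip call such as `despread(chips, [0, 0, 0, 0], [3, 3, 3, 3])` returns results that are negatives of each other, since code 11 is the negation of code 00.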



Vermont architecture enhancements which increase the number of chips per tile are:

■ the address generator
■ support for 8-bit unpacking
■ support for addsub16 and subadd16 instructions

Vermont plus architecture enhancements which increase the number of chips per tile are:

■ adder tree in 2X multipliers
■ despreader opcode in enhanced Vermont

The data storage format for input data is important for efficiency. In some cases increased performance can be obtained if data is stored in memory as 16-bit data or in a redundant form of 8-bit data. The effect of the various hardware options is summarized in Table 1, and the implementations are summarized in Table 2.

Table 1 - Number of chips per tile for despreading data in memory against 2 CODE codes 
function    data type    CS2112    Vermont    Vermont 2Xmul    Vermont SBBA    Vermont EX
despread    8-bit*       1         1.4        2.3              -               4
despread    16-bit       1         1.75       3.5              -               4
despread    2X8-bit**    1.4       -          -                3.5             8
corr        8-bit        52        64         64               64              192



+1 : -q : +q : +i ; 

** stores 8-bit complex data in memory as 32-bit words organized as:
i0 : q0 : i1 : q1,
i1 : q1 : i2 : q2;

Table 2 - The table below details the DPU usage for the despreading modes used above:

format    chip      dpu0               dpu1             dpu2    dpu3    mult             nDPU pilot    nDPU data    nDPU tot    chip/tile
8-bit     2112      mem                unpack negate    swap    tree    -                4             3            7           1
8-bit     V         mem, unpack        negate swap      tree    -       -                3             2            5           1.4
8-bit     V2x       mem unpack         negate swap      -       -       tree             2             1            3           2.3
8-bit     Vex       mem unpack         -                -       -       codemult tree    1             0            1           4
16-bit    2112      mem                negate           swap    tree    tree             4             3            7           1
16-bit    V         mem negate swap    tree             -       -       -                2             2            4           1.75
16-bit    V2x       mem negate swap    -                -       -       tree             1             1            2           3.5
16-bit    Vex       mem                -                -       -       codemult tree    1             0            1           4
2X8bit    2112      mem                negate swap      tree    -       -                3             2            5           1.4
2X8bit    V sbba    mem sbba           -                -       -       tree             2             2            4           3.5
2X8bit    Vex       mem                -                -       -       codemult tree    1             0            1           8



1XRTT / UMTS Rake Receiver channel count

Based on the despreading performance and a 150 MHz clock for Vermont, the estimated 1XRTT and UMTS rake receiver channel counts are summarized below:

                  CS2112    CS2112      Vermont     Vermont 2Xmul    Vermont EX
                  DPUs      channels    channels    channels         channels
1XRTT channels    50        50          75          100              150-200
UMTS channel      32        16          24          32               32-48
Despreading Implementation 1

The diagram below implements a 4-chip despreader against two different CODE codes.

[Figure: 16-bit implementation of despreading opcode. Input din feeds I0 to I3; each stage holds the preformatted pair I+Q, I-Q (H and L halves) and multiplies I,Q by CODE.]

CODE    O[31:16] =       O[15:0] =
00      -H = -(I-Q)      -L = -(I+Q)
01      -L = -(I+Q)       H = (I-Q)
10       L = (I+Q)       -H = -(I-Q)
11       H = (I-Q)        L = (I+Q)

CODE (real, img)    result.real    result.img
00 -> -1,-1         -(r-i)         -(r+i)
01 -> -1, 1         -(r+i)         r-i
10 ->  1,-1         r+i            -(r-i)
11 ->  1, 1         r-i            r+i
Correlation circuit:

The following circuits implement the same correlation function: 
Despreader Trees without input delay

A despreader tree can be constructed to implement a dual 4-chip despreader for 16-bit data and a dual 8-chip despreader for 8-bit data. 4 despreader trees are needed, one for each 16-bit output field.

Tree                 Output       Function
Despreader Tree 0    O0[15:00]    real - i
Despreader Tree 1    O0[31:16]    imaginary - q
Despreader Tree 2    O1[15:00]    real - i
Despreader Tree 3    O1[31:16]    imaginary - q

[Fig 2 - Despread Tree 0. The direct input and input muxes 0 to 3 supply the bytes [07:00], [23:16], [15:08], and [31:24] of i0 to i3; the tree output is O0[15:0].]
Adding a separate input mux and delay chain enables a dual correlation function to be implemented with only one external DPU. For this mode, output O0 is the sum C0 + C1.

Tree                 Function         Output - despread    Output - correlation    Chain input           Chain output
Despreader Tree 0    real - i         O0[15:00]            C0[15:00]               I0[23:16], I0[7:0]    chain[23:16], [7:0]
Despreader Tree 1    imaginary - q    O0[31:16]            C0[31:16]               -                     -
Despreader Tree 2    real - i         O1[15:00]            C1[15:00]               chain[23:16], [7:0]   O1[15:00]
Despreader Tree 3    imaginary - q    O1[31:16]            C1[31:16]               -                     -
[Fig 2 - Despread Tree 0 with delay chain. Labels: direct input, input muxes 0 to 3, bytes [07:00], [23:16], [15:08], and [31:24] of i1 to i3, chain source bytes i0[07:00], i0[23:16], i0[15:08], and i0[31:24], code inputs, output, chain output.]
Physical Layout

                                      elements per output    total
4:1 muxes, 8-bit                      8                      32
pipe regs, 8-bit                      16                     64
XOR, 8-bit                            8                      32
adders, 8-bit                         4                      16
adders, 9-bit                         2                      8
adders, 10-bit                        3                      12
Total blocks                          352                    1408
Total blocks in 4:1 mux and pipe      192                    768

Loads per input = 4

1408 * 100 um sq = 0.1408 mm sq



[Figure: physical layout sketch. Outputs O1[31:16] and O0[31:16]; inputs i0 to i3; four mux groups (m00/m01, m10/m11, m20/m21, m30/m31), each followed by adders a0 to a7.]

16 rows at 10 um each??
The 16-bit output of each 'real' despreader tree is packed with the corresponding 'imaginary' despreader tree output into one 32-bit output: the output of Tree 0 is packed with the output of Tree 1, and the output of Tree 2 is packed with that of Tree 3.

The final add before the output mux is performed inside the 2x multiplier in 4ADD16 (packed 16-bit addition) mode.

An add-one signal decoded inside the despreader determines the other operand of the final add. The operand can be either zero or 2 packed 16-bit '0001' values.
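
The packed 16-bit add can be sketched as two independent lanes with no carry between them. The names `add16_packed` and `add_one_operand` are hypothetical. The per-lane add-one control is an assumption based on the four packed constants listed in the figure residue, and reading the added '0001' as the final step of a two's complement negate (invert, then add one) is likewise an interpretation rather than something the text states.

```python
MASK16 = 0xFFFF


def add16_packed(a, b):
    """Add two 32-bit words as two independent 16-bit lanes (no inter-lane carry)."""
    hi = ((a >> 16) + (b >> 16)) & MASK16
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    return (hi << 16) | lo


def add_one_operand(add_one_hi, add_one_lo):
    """Build the second operand of the final add: 0 or 0x0001 in each lane."""
    return (int(bool(add_one_hi)) << 16) | int(bool(add_one_lo))
```

For example, a low lane holding the bitwise inverse of 5 (0xFFFA) becomes 0xFFFB, the 16-bit two's complement of 5, once the add-one operand is applied to that lane.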
[Figure: final add stage. In each of two paths, a PN decode block selects one of the packed 16-bit constants 0001,0001 / 0001,0000 / 0000,0001 / 0000,0000, which is added to the packed pair of 16-bit tree outputs by a packed 16-bit add in the 2x mult. A third path carries the 32-bit chain output, which is added to 0.]

The 32-bit chain output is added with all zeros in the 2x mult before being sent to output mux 1. The two 32-bit packed outputs C0 and C1 are added together before being sent to output mux 0.



A 5-bit opcode is used in the enhanced multiplier (both 2xmult and desp/corr) to decode 9 modes in the 2xmult and 12 modes in the desp/corr, as shown in the following table. Bits marked ? are not legible in the source.

Mode                   Bit[4]    Bit[3]    Bit[2]    Bit[1]    Bit[0]
4xdesp8 complex        1         1         1         0         0
4xdesp8 comp-conjug    1         1         1         0         1
4xdesp8 zero           1         1         1         1         0
4xdesp8 real           1         1         1         1         1
8xdesp8 complex        1         ?         0         0         0
8xdesp8 comp-conjug    1         ?         0         0         1
8xdesp8 zero           1         ?         0         1         0
8xdesp8 real           1         ?         0         1         1
Corr complex           1         0         1         0         0
Corr comp-conjug       1         0         1         0         1
Corr zero              1         0         1         1         0
Corr real              1         0         1         1         1
2MULT                  0         0         0         0         0
4ADD32                 0         0         1         0         0
4ADD16                 0         0         1         0         1
4MULT                  0         1         0         0         0
4MULTSUM               0         1         0         0         1
4MULT2SUM              0         1         0         1         0
4FIR                   0         1         1         0         0
CMULT                  0         1         1         1         0
CMULT16                0         1         1         1         1

There is an output register in each of the desp/corr trees.

The PN code is registered after the 2-to-1 input scrambling mux.

All desp/corr tree outputs go to an adder in the 2xmult before going to the output mux.
2X Multiplier

1 Extended Multiplier

1.1 Purpose

The purpose of this extended multiplier is to build more functionality into the existing multiplier, optimizing it for the expected customer applications. This is accomplished by adding 2 additional multipliers per tile and 4 additional adders. These are all added to the current multiplier block, in essence creating a large functional unit that is capable of significant processing.

In order to reduce the overhead of this effort, it has been decided to use the same input muxes and output muxes as the current multipliers. This means that the block has 4 * 32-bit inputs and 2 * 32-bit outputs. Thus, to take advantage of having 4 multipliers, the inputs have to be shared between them, and the outputs need to be shared as well. The inputs are generally shared by packing them into the high and low halves of the input words. The outputs are shared either by accumulating the results through the new adders or by packing the results into the output registers.

Finally, for backward compatibility, we need to provide for the case where the new features are bypassed. This is accomplished by selecting the input and output muxes in the same fashion for the old bits, so that the new bits (when defaulted to 0) do not affect the circuit. The opcode is also designed so that the default case of all 0's causes the multipliers to behave as before.
2 Implementation

The XMULT, or "extended multiplier", builds on the existing multiplier block by maintaining the existing I/O structure. It adds 2 additional multipliers and 4 adders to create a structure which can be configured for a variety of different functions. A simplified diagram of the multiplier is illustrated below, along with a table of the target opcodes. The wiring pattern illustrates a backwards-compatible mode. Components in bold illustrate the existing multiplier. Each adder or multiplier has 2 input muxes and an optional output register.

opcode       latency    function
2MULT        2          current multiplier mode
4ADD32       3          sum of 4 32-bit inputs
4ADD16       4          sum of 4 sets of packed 16-bit inputs
4MULT        2          4 independent multipliers with 16-bit packed inputs and outputs
4MULTSUM     4          4 multipliers with 16-bit packed inputs and outputs summed together in tree
4MULT2SUM    3          4 multipliers with 16-bit packed inputs and outputs summed together in tree
4FIR         N/A        4 multipliers with 16-bit packed coefficients, a single 16-bit data input, a 32-bit accumulation input, and 32-bit outputs accumulated in cascade with pipeline registers between accumulators
CMULT        4          16-bit packed complex multiply with 32-bit IQ accumulation input, output
CMULT16      3          16-bit packed complex multiply with FFT butterfly adders with complex input, output
OPERATOR MUX selects by OPCODE

MUX        #    2MULT        4ADD32      4ADD16       4MULT        4MULT+      4MULT2+     4FIR         CMULT16      CMULT
mult0-a    -    i0[31:16]    -           -            i0[31:16]    i0[31:16]   i0[31:16]   i2[15:0]     i0[31:16]    i0[31:16]
                i0[15:0]
mult0-b    -    i1[31:8]     -           -            i1[31:16]    i1[31:16]   i1[31:16]   i1[31:16]    i1[31:16]    i1[31:16]
                i1[31:16]
                i1[15:0]
mult1-a    -    -            -           -            i0[15:0]     i0[15:0]    i0[15:0]    i2[15:0]     i0[15:0]     i0[15:0]
mult1-b    -    -            -           -            i1[15:0]     i1[15:0]    i1[15:0]    i1[15:0]     i1[15:0]     i1[15:0]
mult2-a    -    i2[31:16]    -           -            i2[31:16]    i2[31:16]   i2[31:16]   i2[15:0]     i0[31:16]    i0[31:16]
                i2[15:0]
mult2-b    -    i3[31:8]     -           -            i3[31:16]    i3[31:16]   i3[31:16]   i3[31:16]    i1[15:0]     i1[15:0]
                i3[31:16]
                i3[15:0]
mult3-a    -    -            -           -            i2[15:0]     i2[15:0]    i2[15:0]    i2[15:0]     i0[15:0]     i0[15:0]
mult3-b    -    -            -           -            i3[15:0]     i3[15:0]    i3[15:0]    i3[15:0]     i1[31:16]    i1[31:16]
add0-a     2    -            i0[31:0]    i0[31:0]     -            m0[31:0]    m0[31:0]    i0[31:0]     m0[31:0]     m0[31:0]
add0-b     3    -            i1[31:0]    i1[31:0]     -            m1[31:0]    m1[31:0]    m1[31:0]     ~m1 + 1      ~m1 + 1
add1-a     3    -            i2[31:0]    i2[31:0]     -            m2[31:0]    m2[31:0]    a2[31:0]     m2[31:0]     m2[31:0]
add1-b     2    -            i3[31:0]    i3[31:0]     -            m3[31:0]    m3[31:0]    m3[31:0]     m3[31:0]     m3[31:0]
add2-a     2    -            a0[31:0]    a0[31:0]     -            a0[31:0]    -           m0[31:0]     -            a0[31:0]
add2-b     3    -            a1[31:0]    a1[31:0]     -            a1[31:0]    -           a0[31:0]     -            i2[31:0]
add3-a     3    -            -           a2[31:16]    -            -           -           a1[31:0]     32'h0        a1[31:0]
add3-b     -    -            -           a2[15:0]     -            -           -           m2[31:0]     i3[31:0]     i3[31:0]
omux0      6    m0[31:0]     a2[31:0]    a2[31:0]     m0[31:16]    a2[31:0]    a0[31:0]    -            a0[31:16]    a2[31:0]
                                                      m1[31:16]                                         a1[31:16]
omux1      6    m2[31:0]     a2[Cout]    a3[31:0]     m2[31:16]    a2[Cout]    a1[31:0]    a3[31:0]     a3[31:0]     a3[31:0]
                                                      m3[31:16]

i0-i3 = input mux
m0-m3 = mults
a0-a3 = adder
-a = operand a
-b = operand b

The carry of a 32-bit ADD operation is not brought out unless explicitly specified in this document. The precision of the ADD operation immediately after the multipliers is not lost, because of the duplicate sign bit in the result of the multipliers. For any other addition, it is the user's responsibility to avoid overflow. The user may also apply the shift-down operation to the result of the adders to reduce the loss of precision.
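
As one reading of the CMULT column in the operand table, the datapath can be sketched behaviorally as follows. The assumptions here are loud ones: operands are signed 16-bit two's complement with the real part in bits [31:16], adders wrap at 32 bits with the carry discarded, and the optional shift-up and shift-down normalization steps are ignored. The function and helper names are hypothetical.

```python
def _s16(x):
    """Interpret the low 16 bits of x as a signed two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def _wrap32(x):
    """Wrap to a signed 32-bit value; the carry-out is discarded."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def cmult(i0, i1, i2=0, i3=0):
    """CMULT sketch: i0 and i1 are packed complex words, i2 and i3 are the
    32-bit real and img accumulation inputs. Returns (omux0, omux1)."""
    ar, ai = _s16(i0 >> 16), _s16(i0)
    br, bi = _s16(i1 >> 16), _s16(i1)
    m0, m1, m2, m3 = ar * br, ai * bi, ar * bi, ai * br
    a0 = _wrap32(m0 - m1)    # add0: m0 + (~m1 + 1)
    a1 = _wrap32(m2 + m3)    # add1: m2 + m3
    a2 = _wrap32(a0 + i2)    # add2: real accumulate, routed to omux0
    a3 = _wrap32(a1 + i3)    # add3: img accumulate, routed to omux1
    return a2, a3
```

With i0 = 1+2j and i1 = 3+4j packed as 0x00010002 and 0x00030004, the call returns the real and imaginary parts of the product, -5 and 10, plus whatever is supplied on the accumulation inputs.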
2.1 Bit file encoding

There are 10 bits in the CSMMULT that are not used. These are 'RESERVED' fields, as well as the mult_h lsm wen and the mult_h lsm dynamic mode bits. The wen is not used, as lsm3 is never used as a write lsm unless it is connected to at least one other lsm. The write enable is routed with the write address, and the multiplier cannot generate the write address. The dynamic mode bits are routed with the write data, and the multiplier cannot generate write data. Thus, neither of these fields is meaningful in the CSMMULT.

The multiplier a input mux select is named muxafghsel, which is currently a 1-bit field. We will extend this to 2 bits for both multipliers, at the cost of 2 CSM bits. The output select is named muxmultlsmsel, which is currently a 2-bit field. We will extend this to 3 bits for both multipliers, at the cost of 2 CSM bits. We will add a 5-bit opcode, which will essentially be shared, and which will therefore cost 5 CSM bits. One of the remaining 2 bits will be used to selectively shift all of the multiplier results up by one bit, thereby normalizing off the redundant sign bit. The other remaining bit will be used to selectively shift the adder outputs down by one bit, in order to normalize the adder results. Thus, all of the available CSMMULT bits will be utilized by the new design.
Opcodes (new 4 bit CSMMULT field)

Name       Bit 4  Bit 3  Bit 2  Bit 1  Bit 0  Mult bypass[3:0]  Adder s16bit[3:0]  Adder bypass[3:0]
2MULT      0      0      0      0      0      x1x1              xxxx               xxxx
4ADD32     0      0      1      0      0      xxxx              x000               1100
4ADD16     0      0      1      0      1      xxxx              1111               1000
4MULT      0      1      0      0      0      0000              xxxx               xxxx
4MULTSUM   0      1      0      0      1      0000              x000               x100
4MULT2SUM  0      1      0      1      0      0000              xx00               1111
4FIR       0      1      1      0      0      0000              0000               0000
CMULT      0      1      1      1      0      0000              0000               1100
CMULT16    0      1      1      1      1      0000              0x11               0x11



Input Muxes (2 new CSMMULT bits)

Mux / Select       0          1          2          3
mult0-a            i0[15:0]   i0[31:16]  i2[15:0]   16'h0
mult0-b            i1[15:0]   i1[31:16]  i1[31:8]   24'h0
mult1-a (mult0-a)  i0[31:16]  i0[15:0]   i2[15:0]   16'h0
mult1-b (mult0-b)  i1[31:16]  i1[15:0]   i1[31:16]  16'h0
mult2-a            16'h0      i2[31:16]  i0[31:16]  i2[15:0]
mult2-b            16'h0      i3[31:16]  i3[31:8]   i1[15:0]
mult3-a (mult2-a)  16'h0      i2[15:0]   i0[15:0]   i2[15:0]
mult3-b (mult2-b)  16'h0      i3[15:0]   16'h0      i1[31:16]



Note that the mult1 and mult3 a and b operand muxes are derived from the mult0 and mult2 a
and b operand muxes respectively. For the 16 bit high/low selects, it is useful to note that
selecting the high part of a word for mult0 selects the low part of the same word for mult1.
This is true for the low bits of all of the select fields, but the high bits are reserved for the more
special cases, such as 24*16 multiplies, the FIR opcode, and the CMULT opcode inputs.
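
The complementary high/low behaviour can be sketched as a table lookup. This is a minimal illustration, not the hardware decode: the names `field`, `MULT0_A`, `MULT1_A` and `select_operand` are hypothetical, and the entries are the mult0-a and mult1-a rows of the input mux table as read from the source (the select 0 entry of mult0-a is assumed to be i0[15:0]).

```python
def field(word, hi, lo):
    """Return bits word[hi:lo] of a 32-bit word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

# select -> (input index, high bit, low bit); None stands for the constant 16'h0.
MULT0_A = {0: (0, 15, 0), 1: (0, 31, 16), 2: (2, 15, 0), 3: None}
MULT1_A = {0: (0, 31, 16), 1: (0, 15, 0), 2: (2, 15, 0), 3: None}

def select_operand(table, sel, inputs):
    """Pick the 16-bit operand for one multiplier from inputs i0..i3."""
    entry = table[sel]
    if entry is None:
        return 0
    index, hi, lo = entry
    return field(inputs[index], hi, lo)

inputs = [0xAAAA5555, 0x00000000, 0x12345678, 0x00000000]  # i0, i1, i2, i3
mult0_a = [select_operand(MULT0_A, s, inputs) for s in range(4)]
mult1_a = [select_operand(MULT1_A, s, inputs) for s in range(4)]
```

With select 0 or 1 the two multipliers see opposite halves of i0, so one select value feeds both halves of a packed word to the multiplier pair.
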



Mux input for adders (controls are decoded by opcodes)

Mux / Select  0          1         2            3
add0-a        i0[31:0]   m0[31:0]  32'h0        desp0[31:0]
add0-b        i1[31:0]   m1[31:0]  ~m1[31:0]+1  32'h0
add1-a        i2[31:0]   m2[31:0]  a2[31:0]     desp1[31:0]
add1-b        i3[31:16]  m3[31:0]  32'h0        32'h0
add2-a        a0[31:0]   m0[31:0]  32'h0        desp2[31:0]
add2-b        a1[31:0]   a0[31:0]  i2[31:0]     32'h0
add3-a        a2[31:16]  a1[31:0]  32'h0        desp3[31:0]
add3-b        a2[15:0]   m2[31:0]  i3[31:0]     32'h0



Mux / Select    0         1       2         3         4         5        6         7
omux0 (mult-h)  m0[31:0]  lsmval  i0[31:0]  i1[31:0]  a2[31:0]  {m0,m1}  a0[31:0]  {a0,a1}
omux1 (mult-l)  m2[31:0]  lsmval  i2[31:0]  i3[31:0]  a2[cout]  {m2,m3}  a1[31:0]  a3[31:0]
Mult Shift Up (1 new CSMMULT bit)

Selectively causes the output of the multipliers to be shifted up by one bit for normalization. The
LSB is then connected to Gnd.



Adder Shift Down (1 new CSMMULT bit) 

Selectively causes the output of the adders to be shifted down by one bit for normalization. 
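
The two normalization shifts can be modelled as follows. This is a sketch under stated assumptions: the function names are hypothetical, results are treated as signed 32-bit values, and the adder shift-down is assumed to be an arithmetic (sign-preserving) shift, which the source does not state.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def mult_shift_up(product):
    """Shift a multiplier result up one bit; the vacated LSB is 0 (tied to Gnd)."""
    return to_signed32(product << 1)

def adder_shift_down(total):
    """Shift an adder result down one bit (assumed arithmetic shift)."""
    return to_signed32(total) >> 1

# 0.5 * 0.5 with 16-bit fractional operands: the raw product carries a
# redundant sign bit, and shifting up by one removes it.
raw = 0x4000 * 0x4000
normalized = mult_shift_up(raw)
halved = adder_shift_down(-6)
```
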

Input Mux (1 new population on the interconnect input mux) 

The current input mux supports the constants 0 and -1. It is proposed to add the constant
0x00010001 to allow selective multiplication by 0, 1 and -1.
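
A short sketch of why the extra constant is useful, assuming the constant -1 is the 32-bit word 0xFFFFFFFF and that each multiplier consumes one signed 16-bit half of the word (the helper names are hypothetical):

```python
def s16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def halves(word):
    """Split a 32-bit word into (high, low) signed 16-bit operands."""
    return s16(word >> 16), s16(word)

CONSTANTS = {"0": 0x00000000, "-1": 0xFFFFFFFF, "0x00010001": 0x00010001}

sample = 1234
# Product of the sample with each half of each constant.
products = {name: tuple(sample * h for h in halves(word))
            for name, word in CONSTANTS.items()}
```

Under these assumptions both halves of 0x00010001 are +1, so the three constants give multiplication by 0, -1 and +1 on both packed halves.
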



2.2 Layout Floorplan

[Figure: layout floorplan. Blocks shown: Inputmux0, Inputmux1, Multmux2, Multmux3, Mult2, Mult3,
Adder0, Multmux0, Multmux1, Mult0, Mult1, Adder1, Adder2, Adder3, Outputmux0, Outputmux1.]

2MULT - CS2112 compatible mode, 2 independent multipliers

[Figure: 2MULT datapath. Inputs a, b, c, d; outputs o0, o1. Blocks labelled: inp0, Omux0, mult0,
add0, add1, add2, add3, Omux1.]

4ADD32 - Sum of 4 32-bit inputs

[Figure: 4ADD32 datapath. Inputs a, b, c, d; outputs o0, o1. Blocks labelled: inp0-inp3, Omux0,
mult0-mult3, add1.]

4ADD16 - Sum of 4 packed 16-bit inputs, sum of upper, lower 16-bits

[Figure: 4ADD16 datapath. Inputs a, b, c, d; outputs o0, o1. Blocks labelled: inp0-inp3,
mult0-mult3, add0-add3, Omux0, Omux1.]

[Figures: datapath diagrams whose captions did not survive extraction. Recoverable labels: inputs
c, d; H and L half-word taps; inp0-inp3; mult0-mult3; mult2[31:16], mult3[31:16]; Oreg0, Oreg1;
Omux0; outputs o0H, o0L, o1H, o1L.]

CMULT16 - Complex multiplier with 16-bit packed data and independent delay path.
Assumes real part in high 16 bits, imaginary part in low 16 bits.

[Figure: CMULT16 datapath. Inputs a, b with H and L half-word taps; a >>16 stage; outputs O0, O1.
Blocks labelled: inp0, Omux0.]

CMULT - 32-bit output complex multiply with 32-bit accumulation input. Assumes real
part in high 16 bits, imaginary part in low 16 bits.

[Figure: CMULT datapath. Inputs a, b; output o0. Block labelled: Omux0.]
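
The arithmetic of the complex-multiply modes can be sketched as below. This is an illustrative model rather than the documented datapath: the function names are hypothetical, results are left as unbounded Python integers instead of being wrapped to 32 bits, and the accumulation inputs are modelled as two optional integers. The subtraction in the real part corresponds to the negated-product choice (~m1[31:0]+1) offered on the add0-b mux.

```python
def s16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def unpack_complex(word):
    """Real part in the high 16 bits, imaginary part in the low 16 bits."""
    return s16(word >> 16), s16(word)

def cmult(a_word, b_word, acc_re=0, acc_im=0):
    """Complex product of two packed operands plus an accumulation input."""
    ar, ai = unpack_complex(a_word)
    br, bi = unpack_complex(b_word)
    re = ar * br - ai * bi + acc_re   # two multipliers, second product negated
    im = ar * bi + ai * br + acc_im   # the other two multipliers
    return re, im

result = cmult(0x00010002, 0x00030004)                # (1 + 2j) * (3 + 4j)
accumulated = cmult(0x00010002, 0x00030004, 100, -10)
negative = cmult(0xFFFF0001, 0x0002FFFD)              # (-1 + 1j) * (2 - 3j)
```
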

The 16-bit output of each 'real' despreader tree is packed with the corresponding 'imaginary'
despreader output into one 32-bit output: the output of Tree0 is packed with the output of Tree1,
and the output of Tree2 is packed with the output of Tree3.

The final add before the output mux is performed inside the 2x multiplier in 4ADD16 (packed 16-bit
addition) mode.

An add-one signal decoded inside the despreader determines the other operand of the final add.
The operand is either zero or two packed 16-bit '0001' values.
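
A packed 16-bit add with the add-one operand can be sketched as follows. The names `packed_add16` and `final_add` are hypothetical, and the sketch assumes a single add-one flag that selects between zero and 0x00010001, with no carry propagating between the two halves.

```python
def packed_add16(x, y):
    """Add two 32-bit words as independent 16-bit halves (no carry between halves)."""
    x &= 0xFFFFFFFF
    y &= 0xFFFFFFFF
    hi = ((x >> 16) + (y >> 16)) & 0xFFFF
    lo = ((x & 0xFFFF) + (y & 0xFFFF)) & 0xFFFF
    return (hi << 16) | lo

def final_add(packed_trees, add_one):
    """Final add before the output mux: the other operand is 0 or two packed 0001s."""
    return packed_add16(packed_trees, 0x00010001 if add_one else 0)

no_increment = final_add(0x12340000, False)
# The low half wraps from 0xFFFF to 0x0000 without disturbing the high half's sum.
with_increment = final_add(0x0001FFFF, True)
```
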



Chameleon Systems Confidential

Vermont Despreader / Correlator Specification
Document Control No. 01-003, Revision 1.1

Correlator integration with input and output muxes



[Figure: correlator integration with the input and output muxes. Recoverable labels: PN decode;
16-bit imag; add-one operand choices 0001,0001 / 0001,0000 / 0000,0001 / 0000,0000; 16-bit packed
values; packed 16-bit add in 2x mult; 32-bit chain output.]



The 32-bit chain output is added with all zeros in the 2x mult before being sent to output mux 1.
The two 32-bit packed outputs C0 and C1 are added together before being sent to output mux 0.

A 5-bit opcode is used in the enhanced multiplier (both 2xmult and desp/corr) to decode 9 modes in
the 2xmult and 12 modes in the desp/corr, as shown in the following table.



Mode                 Bit[4]  Bit[3]  Bit[2]  Bit[1]  Bit[0]
4xdesp8 complex      1       1       1       0       0
4xdesp8 comp-conjug  1       1       1       0       1
4xdesp8 zero         1       1       1       1       0
4xdesp8 real         1       1       1       1       1
8xdesp8 complex      1       1       0       0       0
8xdesp8 comp-conjug  1       1       0       0       1
8xdesp8 zero         1       1       0       1       0
8xdesp8 real         1       1       0       1       1
Corr complex         1       0       1       0       0
Corr comp-conjug     1       0       1       0       1
Corr zero            1       0       1       1       0
Corr real            1       0       1       1       1
2MULT                0       0       0       0       0
4ADD32               0       0       1       0       0
4ADD16               0       0       1       0       1
4MULT                0       1       0       0       0
4MULTSUM             0       1       0       0       1
4MULT2SUM            0       1       0       1       0
4FIR                 0       1       1       0       0
CMULT                0       1       1       1       0
CMULT16              0       1       1       1       1
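
The mode table can be expressed as a lookup, which also makes the split between 2xmult and desp/corr modes easy to check. This is a sketch only: `MODES`, `decode` and `is_desp_corr` are hypothetical names, and Bit[4] of the four Corr rows is not legible in the source, so the value 1 used for them here is an assumption.

```python
MODES = {
    0b11100: "4xdesp8 complex", 0b11101: "4xdesp8 comp-conjug",
    0b11110: "4xdesp8 zero",    0b11111: "4xdesp8 real",
    0b11000: "8xdesp8 complex", 0b11001: "8xdesp8 comp-conjug",
    0b11010: "8xdesp8 zero",    0b11011: "8xdesp8 real",
    # Bit[4] = 1 is assumed for the Corr modes (not legible in the source).
    0b10100: "Corr complex",    0b10101: "Corr comp-conjug",
    0b10110: "Corr zero",       0b10111: "Corr real",
    0b00000: "2MULT",    0b00100: "4ADD32",    0b00101: "4ADD16",
    0b01000: "4MULT",    0b01001: "4MULTSUM",  0b01010: "4MULT2SUM",
    0b01100: "4FIR",     0b01110: "CMULT",     0b01111: "CMULT16",
}

def decode(opcode):
    """Return the mode name for a 5-bit opcode, or None if it is unassigned."""
    if not 0 <= opcode <= 0b11111:
        raise ValueError("opcode must fit in 5 bits")
    return MODES.get(opcode)

def is_desp_corr(opcode):
    """Under the assumption above, Bit[4] separates desp/corr from 2xmult modes."""
    return bool(opcode & 0b10000)

desp_corr_count = sum(1 for op in MODES if is_desp_corr(op))
mult_count = len(MODES) - desp_corr_count
```
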



There is an output register in each of the desp/corr trees.

The PN code is registered after the 2-to-1 input scrambling mux.

The outputs of all desp/corr trees go to an adder in the 2xmult before going to the output mux.

[Figure: layout floorplan with despreader/correlator. Blocks shown: Inputmux0, Inputmux1, Multmux2,
Multmux3, Mult2, Mult3, Multmux0, Multmux1, Mult0, Mult1, Desp/Corr Tree 0, Desp/Corr Tree 1,
Desp/Corr Tree 2, Desp/Corr Tree 3, Adder0, Adder1, Adder2, Adder3, Outputmux0, Outputmux1.]